ABSTRACT

PortEco (http://porteco.org) aims to collect, curate 
and provide data and analysis tools to support basic 
biological research in Escherichia coli (and eventu- 
ally other bacterial systems). PortEco is imple- 
mented as a 'virtual' model organism database 
that provides a single unified interface to the user, 
while integrating information from a variety of 
sources. The main focus of PortEco is to enable 
broad use of the growing number of high-through- 
put experiments available for E. coli, and to leverage 
community annotation through the EcoliWiki and 
GONUTS systems. Currently, PortEco includes 
curated data from hundreds of genome-wide RNA 
expression studies, from high-throughput pheno- 
typing of single-gene knockouts under hundreds of 
annotated conditions, from chromatin immunopre- 
cipitation experiments for tens of different 
DNA-binding factors and from ribosome profiling 
experiments that yield insights into protein expres- 
sion. Conditions have been annotated with a 
consistent vocabulary, and data have been consist- 
ently normalized to enable users to find, compare 
and interpret relevant experiments. PortEco 
includes tools for data analysis, including clustering, 
enrichment analysis and exploration via genome 
browsers. PortEco search and data analysis tools 
are extensively linked to the curated gene, meta- 
bolic pathway and regulation content at its sister 
site, EcoCyc. 



INTRODUCTION 

The central role of Escherichia coli research in the history
of molecular genetics, systems biology and synthetic
biology makes the data generated from E. coli important
not only for this model organism, but also for bacteria in 
general, including environmental sequencing and human 
microbiome studies. High-throughput molecular biology 
technologies are transforming biological research, 
making it possible to probe the detailed systems responses 
of organisms to perturbations in their genetics or environ- 
ment. A large number of such data sets have been, and 
continue to be, collected for E. coli, one of the best-studied 
bacterial model organisms. PortEco (http://porteco.org) is 
a data resource that provides access to data and tools to 
allow users to efficiently find and integrate information 
from more than half a century of basic research on labora- 
tory E. coli, its phages, plasmids and mobile genetic 
elements. 

PortEco's mission is to support bacterial research by
facilitating access to the massive (and continually
growing) volume of experimental data for E. coli, and
eventually other bacterial model systems. Making these
data truly accessible requires both data handling
(collection, consistent and updated processing, and
curation, e.g. creation of accurate data descriptions) and
databases and intuitive software with which users can find
and analyze existing data to help pose or answer novel
research questions. PortEco is designed to be a 'central
point of access' for such data, but it does not seek to
reinvent the wheel. EcoCyc (1) already provides curated,
review-level data for E. coli genes, metabolic pathways
and, in collaboration with RegulonDB (2), operons and gene
regulatory interactions. PortEco, by contrast, focuses on
high-throughput experimental data and analysis tools,
described in detail below, as well as covering genetics
data and information about E. coli plasmids and phage.
PortEco, through EcoliWiki (3) and GONUTS (4), also
supports community input in a variety of areas. As
researchers need to navigate quickly between all of these
data sources, PortEco and EcoCyc have extensive reciprocal
links, and the PortEco integrated search simultaneously
searches PortEco and EcoCyc, as well as other, more
specialized, data resources. Together, these resources
create a more complete and powerful solution for the needs
of researchers using E. coli as a model system for
microbiology and molecular biology, for biotechnology, or
as a platform for systems and synthetic biology.



INTEGRATED SEARCH 

The PortEco search is designed to be a 'one-stop' search 
for information about E. coli. Searches are 'comprehen- 
sive', including not only PortEco data sources, but also 
other databases with E. coli information. On the technical 
side, the searches of all different data sources are carried 
out simultaneously via web services, and the results page is 
continually updated as search results arrive from each 
source using AJAX (asynchronous JavaScript and
XML). Currently, PortEco searches 16 different data
sources (Table 1), and new resources that support web 
services-based queries can easily be added. By default, 
the search displays all results from each resource that 
are associated with the search term. However, the 
PortEco search is also 'context-sensitive': it automatically 
detects if the user has entered a gene name or synonym, 
and filters and formats the results into a 'gene view.' The 
gene view displays only those results obtained for the 
specific gene, and performs additional queries for more 
detailed information about that gene (Table 2). Users 
can still view the 'full results' even for a gene query, by 
clicking on the 'view full results' button at the top of the 
gene view. 
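
The fan-out pattern described above, in which every source is queried at once and results are shown as they arrive, can be sketched in a few lines. This is an illustration only and not PortEco's implementation: the source names are taken from Table 1, but `query_source`, the simulated delays and the small synonym table are hypothetical stand-ins for real web-service calls and gene lookups.

```python
import asyncio

# Hypothetical synonym table; a real system would consult a gene database.
GENE_SYNONYMS = {"lacz": "lacZ", "b0344": "lacZ"}

# Simulated response delays in seconds, standing in for web-service latency.
SOURCES = {"EcoCyc": 0.03, "UniProt": 0.01, "STRING": 0.02}


async def query_source(name, delay, term):
    """Pretend to call one source's web service; return (source, result)."""
    await asyncio.sleep(delay)
    return name, f"{name} results for {term}"


async def integrated_search(term):
    """Query every source concurrently and collect results as they arrive."""
    gene = GENE_SYNONYMS.get(term.lower())  # context-sensitive gene detection
    tasks = [asyncio.create_task(query_source(name, delay, gene or term))
             for name, delay in SOURCES.items()]
    arrived = []
    for finished in asyncio.as_completed(tasks):
        arrived.append(await finished)  # a results page would be updated here
    return {"view": "gene" if gene else "full", "results": arrived}


result = asyncio.run(integrated_search("b0344"))
```

Because no source waits for another, a slow resource delays only its own section of the results, and a new source is added by extending `SOURCES`.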



PORTECO DATA: COLLECTION, PROCESSING 
AND CURATION 

PortEco is collecting, processing and curating data from 
experiments in E. coli. These data types currently include 
the following: 

• Genome-scale mRNA expression data 

• Alleles and phenotype data for E. coli mutant strains 
from both curated articles and genome-scale growth 
experiments 

• Genomic features of E. coli plasmids and phage 

• Genome-scale protein-DNA interactions from chro- 
matin immunoprecipitation (ChIP) experiments and 
genomic SELEX experiments 

• Genome-scale ribosome profiling data 

• An interactive, community-editable E. coli strain 
genealogy 

• Gene Ontology (GO) annotations of gene functions (in 
collaboration with EcoCyc) 

• Gene family trees and orthologs of E. coli genes in 
representative species 

• A corpus of E. coli scientific literature



mRNA expression data 

At PortEco, we collect publicly available microarray data,
with the vast majority of these data being taken from
ArrayExpress (16) or the Gene Expression Omnibus
[GEO, (17)]. To allow results from different laboratories
and different experiment sets to be compared with one
another, raw data are processed and normalized using a
standard procedure before being made available at
PortEco. The processing pipeline includes associating
each probe on a particular microarray platform with the
correct genomic coordinates [by remapping probes to the
current genome sequence, because many probes were designed
before the current version of the sequence (18)], and
associating those coordinates with the correct gene
name and, where synonyms exist, an extensive list of
synonyms. This allows data to be retrieved regardless



Table 1. Resources integrated into PortEco search

Resource                           Data                                                                 Maintained by PortEco
BioModels (5)                      Quantitative models for simulating metabolic and regulatory systems  No
EcoCyc (1)                         Genes, pathways, operons                                             No (sister site)
EcoGene (6)                        Genes                                                                No
EcoliWiki (3)                      Genes, strains, alleles                                              Yes
GenExpDB (http://genexpdb.ou.edu)  Gene expression profiles                                             No
NCBI (7)                           Genes                                                                No
PANTHER (8)                        Gene families and orthologs of E. coli genes                         Yes
Pathway Commons (9)                Pathways and protein interactions                                    No
Protein Data Bank (10)             Protein 3D structures (experimental)                                 No
Protein Model Portal (11)          Protein 3D structures (both experimental and models)                 No
PortEco GBrowse                    Gene location, ChIP, ribosome profiling, RNA-seq                     Yes
PortEco Gene Expression            Gene expression profiles; expression conditions                      Yes
PortEco Phenotype                  Knockout phenotype data                                              Yes
PortEco Textpresso                 Publications (full text search)                                      Yes
STRING (12)                        Predicted interacting genes                                          No
UniProt (13)                       Proteins                                                             No



Table 2. Additional information retrieved for PortEco search results for genes

Source                   Additional information
PortEco GBrowse          Thumbnail image of genomic location, links to ChIP, ribosome profiling, RNA-seq
EcoCyc                   Gene summary information
PortEco Gene Expression  Thumbnail image of most significant differential expression conditions
PortEco Phenotype        Thumbnail image of most significant growth phenotype conditions
GenoBase (14)            Knockout phenotypes, protein localization, gene essentiality
EcoliWiki                Mutant alleles and phenotypes
EcoliWiki                Available mutant strains
InterPro (15)            Comprehensive lists of subfamily and family members



of what gene identifier might have been used when the
data were first deposited. Each experiment (microarray)
is manually curated: information about growth and
treatment conditions is collected, along with names and
genotypes of the strains used. The descriptions of
experimental conditions accompanying publicly deposited
microarray data are often abbreviated and sometimes
incomplete. In such cases, we turn to the associated
publications and citations therein to track down
experimental details. When necessary, we contact authors
for further information. In a similar fashion, we try to
obtain complete genotypes for strain(s) used and, whenever
possible, determine strain lineages. Details about strain
constructions and lineages are entered on strain pages at
EcoliWiki (for example,
http://ecoliwiki.net/colipedia/index.php/Category:Strain:BW25113
contains information about BW25113, the strain background
for the Keio knockout collection). Another part of the
curation process assigns each experiment (microarray) to an
experimental condition category; this allows users to
search for microarrays that may be related using those
categories as queries. We are collaborating with the
RegulonDB (2) and COLOMBOS (19) groups to establish a
common set of condition terms.

At PortEco, we currently have data associated with 193
publications that have been published over the past 12
years. These data are normalized, converted to log ratios
if necessary (for example, single-channel Affymetrix data
are converted to ratio-style measurements by using as the
denominator either a control array or a probe's average
intensity in the data set), and then clustered. We note
that GenExpDB (http://genexpdb.ou.edu/) has some
functionality similar to that of our expression site, and
also contains expression data for E. coli imported from
GEO. For a given gene or genes entered into the GenExpDB
search box, a heatmap can be retrieved for those genes'
expression across all conditions for which they have data
available. However, GenExpDB does not provide the ability
to cluster data for arbitrary genes across an arbitrary set
of conditions, nor does it annotate the conditions with a
consistent set of controlled vocabulary terms, instead
relying on the meta-data imported from GEO. In addition, it
does not provide a means by which to select the most
significantly expressed genes from any given condition or
set of conditions. These functionalities are all currently
available from PortEco (see below).



Nucleic Acids Research, 2014, Vol. 42, Database issue D679
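
The conversion of single-channel intensities to ratio-style values mentioned above can be illustrated with a short sketch. It shows both denominators named in the text (a control array, or each probe's average intensity across the data set). The function names, the use of base-2 logarithms and the arithmetic mean are assumptions made for illustration; the text does not specify them.

```python
import math


def log_ratios_vs_control(sample, control):
    """log2(sample / control) per probe, using a control array as denominator."""
    return {probe: math.log2(sample[probe] / control[probe])
            for probe in sample if probe in control}


def log_ratios_vs_probe_mean(arrays):
    """log2(intensity / that probe's mean intensity across all arrays)."""
    probes = arrays[0].keys()
    means = {p: sum(a[p] for a in arrays) / len(arrays) for p in probes}
    return [{p: math.log2(a[p] / means[p]) for p in probes} for a in arrays]


# Hypothetical intensities for a single probe on two arrays.
ratios = log_ratios_vs_control({"probe_1": 400.0}, {"probe_1": 100.0})
```

Both functions assume strictly positive intensities; background-corrected values at or below zero would need to be handled before taking logarithms.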



With the advent of high-throughput sequencing,
researchers can now not only determine in unprecedented
detail which parts of the genome are actually transcribed,
but also quantify the level at which they are transcribed
over a linear range spanning five orders of magnitude, at
least two orders more than is possible with microarrays
(20). While there are few RNA-Seq data sets currently
available for E. coli, these data sets are expected to be
generated with increasing frequency, as fewer experiments
are performed using microarray technology. At PortEco, we
are developing standard pipelines to take the raw read
data (in FASTQ format), map these data to the latest
version of the genome and then determine expression values
for each gene (in RPKM) using the latest genome annotation.
As the genome sequence is updated, and as the primary
annotation of the genome changes (for example, with newly
described transcripts), we will be able to reprocess all
data sets using the same pipeline, to provide consistent
and comparable results across all RNA-Seq data sets.
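
For readers unfamiliar with the unit, RPKM (reads per kilobase of transcript per million mapped reads) scales a gene's read count by its length and by the sequencing depth. The sketch below shows only this final arithmetic step, not the mapping pipeline; the gene names, lengths and counts are hypothetical.

```python
def rpkm(gene_reads, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


def rpkm_table(read_counts, gene_lengths):
    """Compute RPKM for every gene from raw counts and gene lengths (bp)."""
    total = sum(read_counts.values())
    return {gene: rpkm(count, gene_lengths[gene], total)
            for gene, count in read_counts.items()}


# Hypothetical counts and lengths for two genes.
values = rpkm_table({"geneA": 500, "geneB": 1500},
                    {"geneA": 1000, "geneB": 3000})
```

Because gene length enters the denominator, RPKM values change whenever the annotation changes a gene's coordinates, which is one reason reprocessing all data sets with a single pipeline keeps them comparable.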

Alleles and phenotypes 

EcoliWiki gene pages contain >16 000 entries for alleles or
phenotypes for E. coli genes. These alleles and phenotypes 
are a combination of alleles imported from the records of 
the E. coli Genetic Stock Center (21) and information 
from manual curation of the E. coli genetics literature. 
As part of EcoliWiki, these pages are available for com- 
munity curation. 

Nichols et al. performed large-scale determination of
growth phenotypes for 3979 mutants under 324 conditions
representing 114 distinct stresses (22). This data set
provides a rich source of functional insights from
comparison of phenotypic profiles between genes and
conditions. PortEco provides two systems for browsing data
from this study. The original data browser allows users
to query and browse the fitness data and the authors'
correlations between strains or conditions. This phenotypic
profiles data browser, which was linked in the article, is
one of the most heavily accessed components of PortEco.
Integration with EcoliWiki allows the search to recognize
records by the current gene names and synonyms. In a
second-generation data browser, we have adapted the
GeneXplorer system (23) used for expression data to
allow users to recluster and analyze subsets of the
large-scale growth phenotypes. This system provides the
significant phenotypes section displayed by the PortEco
search.

Genome-scale protein-nucleic acid interaction data

Visualizing the locations of protein-nucleic acid
interactions in the context of genes and the genome
provides valuable insights about central dogma processes
(replication, transcription and translation) and their
regulation. Experiments of this kind include ChIP-chip
(24-31), ChIP-Seq (29,32) and ribosome profiling (33-35).
PortEco identifies studies of these kinds from the
literature and, for studies in preparation, by contact with
authors. Data are obtained from repositories or
supplemental data or, in some cases, by contacting the
authors. If necessary, data are background corrected,
renormalized and converted to standard file formats for
display in the EcoCyc genome browser, our own GBrowse
(36,37) instance, and our JBrowse (38) test site. The
converted data are downloadable in common file formats
(gff, wiggle, bam) for viewing in other browsers.
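
As an illustration of the conversion step, the sketch below writes per-position signal values as a wiggle track in `variableStep` form, one of the formats named above. It is a minimal example, not PortEco's conversion code; the function name, the track name and the sequence identifier passed as `chrom` are placeholders, and the identifier must match whatever name the target browser uses for the genome.

```python
def to_wiggle(signal, chrom, track_name):
    """Render {1-based position: value} as a variableStep wiggle track."""
    lines = [f'track type=wiggle_0 name="{track_name}"',
             f"variableStep chrom={chrom}"]
    for position in sorted(signal):  # wiggle positions must be ascending
        lines.append(f"{position} {signal[position]:.3f}")
    return "\n".join(lines) + "\n"


# Hypothetical signal at two genome positions.
text = to_wiggle({20: 1.5, 10: 0.25}, "NC_000913", "demo")
```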

Information about each data set is associated with an 
EcoliWiki page for the relevant publication. Publications 
associated with browser tracks are placed in EcoliWiki
categories to help users find data sets of interest among 
the growing corpus of experiments, and tracks from each 
publication are annotated in a table that allows the 
complete set of tracks or arbitrary subsets of tracks in 
EcoliWiki to be displayed in other wiki pages. 

The PortEco GBrowse and JBrowse genome browsers 
allow users to view these data in the context of curated 
genomic features. We provide browsers for multiple E. coli 
strains, plasmids and bacteriophage. Default tracks are
generated from RefSeq and GenBank records, while
additional tracks provide alternative annotations, such as
operons from RegulonDB, locations of cloned inserts,
known deletions and other manually curated content.

E. coli strain genealogies 

In EcoliWiki, PortEco provides community-editable
information about >280 strains. Strain information includes
genotypes, references, construction details and sources
for obtaining the strain. Strains are arranged in
genealogies based on their construction. We currently
include all of the strains described in the genealogies for
E. coli K-12 and E. coli B by Bachmann (39) and Daegelen
et al. (40), respectively. PortEco also supports
pathway-genome databases for several E. coli strains,
allowing comparison of these strains using BioCyc tools.

GO annotations of gene function 

PortEco and EcoCyc collaborate to maintain and update 
the annotation of E. coli gene function for the GO con- 
sortium (41,42). We regularly aggregate and deposit an 
up-to-date gene annotation file that is downloadable 
from either PortEco or the GO consortium Web site. 
This file is constructed by combining annotations
from UniProt with the professionally curated GO annota- 
tions from EcoCyc and community annotations from 
EcoliWiki and GONUTS, which provides a community 
GO annotation system for any protein in UniProt. 



Orthologs and gene family trees 

E. coli gene families and phylogenetic trees are generated 
and curated in collaboration with the PANTHER 
database (8). Currently, 2657 genes (64% of protein- 
coding genes) have been placed in phylogenetic trees. 
'Strict' orthologs (i.e. genes related by vertical descent
from a common ancestor) are computed from these trees
in 81 other organisms (listed at
http://pantherdb.org/panther/summaryStats.jsp). Hidden
Markov models are created for both families and
subfamilies, to allow searching for related genes in other
genomes. These
Hidden Markov models are run regularly on the 
UniProt database as part of the InterPro project (15), so 
users can navigate to comprehensive lists of related genes. 

E. coli literature 

EcoliWiki contains wiki pages for >25 000 publications.
These pages allow community-editable addition of notes
and discussion, links to other PortEco content and data
tables for data mining, such as the track information
tables described above. Articles covered in EcoliWiki are
used to automatically update the literature corpus for
full-text indexing by the PortEco instance of Textpresso
(43), which has been modified to offer a more user-friendly
interface and a web service that supplies relevant
articles to the integrated PortEco search.

EcoliHouse: database access to gene information 

EcoliHouse is a database warehouse containing multiple
E. coli databases. EcoliHouse serves two purposes within
PortEco. First, it is a publicly queryable MySQL database
that allows scientists to issue SQL queries across multiple
E. coli databases. Second, it is the database to which the
PortEco web-based multigene query system sends queries
to access the EcoCyc and EcoGene databases. The databases
currently present within EcoliHouse are EcoCyc,
EcoGene, Eco2Dbase, the UniProt complete proteome
for E. coli K-12, the RefSeq E. coli K-12 MG1655
genome entry, the GenBank E. coli K-12 MG1655
genome entry and several E. coli ChIP-chip data sets.
See http://biowarehouse.ai.sri.com/EcoliHouseOverview.html
for a listing of the current databases within
EcoliHouse, EcoliHouse access instructions and example
queries.

HIGH-THROUGHPUT DATA ANALYSIS 
WORKFLOWS 

PortEco is designed to facilitate retrieval and analysis
of high-throughput data sets that have been generated
for E. coli (Figure 1). There are three starting points for
accessing E. coli data in PortEco: (i) search for a specific
gene, (ii) search for a specific set of experimental
conditions (for either gene expression or growth phenotype
data) and (iii) search for a specific set of experiments to
view in a genome browser. PortEco uses the GeneXplorer
tool (23) for display of gene expression and knockout
growth phenotype data, which in PortEco is now seamlessly
integrated with analysis tools from the PANTHER
and EcoCyc Web sites. PortEco currently uses GBrowse
(36) as a genome browser, though JBrowse (44) is currently
available on a testing site and will be fully
released in the near future.

[Figure 1 flowchart. Entry points: 'Find and analyze knockout phenotype or gene expression data', 'Find data for a specific gene' and 'View data in genome browser' (ChIP, ribosome profiling, RNA-seq). Intermediate steps: select data subsets (conditions, mutants, strains); browse conditions sorted by significance relative to all conditions in the database and select specific conditions for further analysis; browse and select available genome browser tracks; select subsets of genes with similar profiles. Outputs: gene set enrichment at pantherdb.org (classes of genes with values that deviate significantly from expectation); list of genes significantly up or down in selected experiments; gene set overrepresentation at pantherdb.org (classes of genes more common in the selected subset than expected); pathway view at ecocyc.org (data for selected genes overlaid onto metabolic pathways); gene group at ecocyc.org (curated information about selected genes, overrepresentation analyses and more); view data in genome browser.]

Figure 1. Main workflows supported for retrieving and analyzing high-throughput data sets at PortEco. There are three entry points (blue text at
top), and at each intermediate step the user can choose between several different paths for further analysis and exploration.

Search for a specific gene 

Searching for a gene name, synonym or accession launches
the PortEco gene search results view (see 'Integrated
Search' above). From here, users can click on the
genome browser link to view the genomic context and
select ChIP, ribosome profiling and RNA-seq tracks to
add to the view. Users will see a thumbnail of conditions
where mRNA expression of that gene is up- or
downregulated, and another thumbnail of conditions
where the knockout of that gene has increased or
decreased growth rate. Clicking on the link to analyze
all data (for either expression or growth phenotype) will
launch the Samples and Conditions view of the
GeneXplorer tool, allowing the user to (i) browse the
conditions that have the most significantly increased or
decreased expression or growth, and (ii) select subsets of
conditions for clustering. This allows users to find genes
that are correlated with the gene of interest specifically
under those conditions where the gene of interest shows a
significant expression change or phenotype. Focusing on
specific conditions helps to avoid spurious correlations
driven by the majority of conditions where there is little
or no effect on the expression or knockout phenotype of
most genes. Note that because this point of entry provides
the ability to retrieve data from many unrelated
experiments, log ratio data are not necessarily as
applicable as they are when analyzing a coherent data set
from a single publication. Thus, all data are transformed
into Z-scores, which indicate how many standard deviations
above or below the mean for that experiment a particular
gene's expression or phenotype value lies.
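
The Z-score transformation just described can be written as a short function. This is a sketch of the standard calculation, with each value expressed relative to the mean and standard deviation of its own experiment. The use of the population standard deviation is an assumption, as the text does not say which estimator is used, and the gene names and values are hypothetical.

```python
import statistics


def z_scores(experiment):
    """Convert one experiment's {gene: value} map to {gene: Z-score}."""
    values = list(experiment.values())
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD; an assumption
    if sd == 0:
        raise ValueError("all values are identical; Z-scores are undefined")
    return {gene: (value - mean) / sd for gene, value in experiment.items()}


# Hypothetical log ratios for three genes in one experiment.
z = z_scores({"geneA": 1.0, "geneB": 2.0, "geneC": 3.0})
```

Expressing every experiment on this common scale is what allows values from unrelated studies to be placed side by side.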

The Samples and Conditions view displays a histogram
with the Z-scores for that gene's expression or phenotype
data and a list of the experiments where the Z-score for the
gene is above a user-selected threshold. Once conditions of
interest have been selected, the data for all genes in those
conditions can be clustered, and a GeneXplorer window
then shows global and zoomed 'heatmap' views for the
clustered data. Within the zoomed view, users can see
gene names, product descriptions and links to resources
for more information. At this point, users have a number
of options. For any particular gene they can get a list of
other genes with the most highly correlated and
anti-correlated expression patterns or phenotypic profiles
across the selected conditions. Subclusters of genes and
data can be selected and further analyzed in a number
of ways, including finding overrepresented pathways/
processes, viewing in the EcoCyc 'cellular overview' tool
or sending to the EcoCyc 'groups' tool (1).
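
Ranking genes by how closely their profiles track a gene of interest across the selected conditions can be sketched as follows. Pearson correlation is used here as an assumption; the text does not state which correlation measure the tool applies. The profile names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def rank_by_correlation(profiles, query):
    """Return (gene, r) pairs from most correlated to most anti-correlated."""
    scored = [(gene, pearson(profiles[query], profile))
              for gene, profile in profiles.items() if gene != query]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical Z-score profiles over three selected conditions.
profiles = {"query": [1.0, 2.0, 3.0],
            "up": [2.0, 4.0, 6.0],
            "down": [3.0, 2.0, 1.0]}
ranked = rank_by_correlation(profiles, "query")
```

A profile that is constant across the selected conditions has zero variance and would need to be excluded before calling `pearson`.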

Search for a specific set of experimental conditions 

Using the 'cluster my genes' tool, users can browse the
available experiments for selection. As described above,
the experiments have been classified manually by the
type of experimental conditions, the strain(s) used, the
specific mutant (if applicable) and the publication. Users
can select data sets by any of these criteria, and optionally
enter a subset of genes (all genes are considered by
default). They can then (i) retrieve a list of genes that
are significantly up or down in the selected experiments
(based on Z-scores relative to all experiments in the
database, as described above), (ii) analyze those
conditions for enriched biological pathways/processes or
(iii) cluster the patterns for different genes under the
selected conditions. Selected genes and data sets are then
retrieved and clustered, and displayed using GeneXplorer.
Clusters can be further analyzed as described above.
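
To make the idea of an overrepresented pathway or process concrete, the sketch below computes a one-sided hypergeometric p-value: the probability of seeing at least the observed number of class members in a selected gene list by chance. This is a generic textbook test offered as an illustration; it is not claimed to be the statistic used by the PANTHER or EcoCyc tools, and the function name and numbers are hypothetical.

```python
from math import comb


def overrepresentation_p(population, annotated, selected, overlap):
    """P(X >= overlap) when `selected` genes are drawn without replacement
    from `population` genes, of which `annotated` belong to the class."""
    upper = min(annotated, selected)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(population, selected)


# Hypothetical example: 10 genes in total, 5 in the class, 5 selected,
# and all 5 selected genes belong to the class.
p_value = overrepresentation_p(10, 5, 5, 5)
```

In practice many classes are tested at once, so p-values of this kind are normally corrected for multiple testing.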

Search for a specific set of experiments to view in a 
genome browser 

Figure 2 illustrates the use of EcoliWiki to manage and
personalize views of track collections for high-throughput
data. The curation of track data in EcoliWiki publication
pages allows us to generate interactive tables of available
data sets. These list the author and publication, the type of
experiment, a brief description and the strains used.
Entering a search term will dynamically filter the table
to include only those entries matching the term (e.g. by
entering 'ribosome profiling' the table will be reduced to
only those types of experiments). The user can then select
the data sets to launch in a genome browser. In addition to
the global listing of data tracks, users can create their own
custom views of subsets of tracks based on querying the
global set of browser tracks from high-throughput data.

[Figure 2, panels A-E. Panel labels include 'Category:Papers with tracks in Gbrowse', 'Category page' and 'List of tracks from specific PMIDs'.]

Figure 2. (A) EcoliWiki pages for publications related to track data include a user-editable table listing relevant track data, with links to genome
browsers and data files. These pages can be tagged to place them into EcoliWiki Categories. (B) A tag extension on the appropriate Category page
creates a summary table aggregating all available high-throughput tracks from Category members. (C) The table can be sorted and searched for
keywords and used to launch GBrowse views with the selected tracks enabled. Custom track lists can be created based on (D) keyword matches or
(E) lists of PubMed IDs.
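
The two ways of building a custom track list described above, by keyword match or by a list of publication identifiers, amount to simple filters over the track table. The sketch below illustrates this with hypothetical rows and placeholder identifiers; it is not the EcoliWiki tag extension itself.

```python
# Hypothetical rows; real entries would come from EcoliWiki publication pages.
TRACKS = [
    {"pmid": "PMID-A", "experiment": "ChIP-chip",
     "description": "binding of factor X"},
    {"pmid": "PMID-B", "experiment": "ribosome profiling",
     "description": "exponential phase"},
    {"pmid": "PMID-C", "experiment": "ChIP-seq",
     "description": "binding of factor Y"},
]


def filter_by_keyword(tracks, term):
    """Keep rows in which any field contains the term (case-insensitive)."""
    term = term.lower()
    return [row for row in tracks
            if any(term in str(value).lower() for value in row.values())]


def filter_by_pmids(tracks, pmids):
    """Keep rows whose publication identifier is in the given collection."""
    wanted = set(pmids)
    return [row for row in tracks if row["pmid"] in wanted]


profiling_tracks = filter_by_keyword(TRACKS, "ribosome profiling")
```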

PREPUBLICATION SERVICES 

In addition to comparing their data sets with publicly
available data sets, users can use PortEco
tools to create password-protected private views of their
data. Private views of data that can be visualized as
genome browser tracks, such as genome-scale protein-DNA
interactions, ribosome profiling or alternative
genome annotations, can be created using the custom
tracks capabilities of GBrowse (36,45). This allows users
to view their data in the context of other work and existing
annotations. GBrowse allows users to do this without
even having to tell PortEco about it. Reviewers can be
provided with access to these private views before
publication. However, when authors work with PortEco, we
can move these temporary custom tracks into the permanent
collection so that stable URLs can be included in
manuscripts and the data can be opened to the public on
publication. For example, Myers et al. (29) were able to
provide links for ChIP-chip and ChIP-seq data sets, while
Liu et al. (34) used the PortEco browser for ribosome
profiling data mapped against both the E. coli K-12 and
bacteriophage lambda genomes.

In other cases, the data of interest are tabular, and we
can create custom web-based tools to analyze them and
then provide public access. We have constructed a
framework for quickly building access-controlled custom
views of tabular data. Unlike tabular data in Excel or
Google Spreadsheets, these tables can leverage PortEco
so that they can be searched using
synonyms for accessions or gene names in the user data
sets, and links from the tables to PortEco or EcoliWiki
can be built in more easily than if authors built and
maintained their own web interfaces for supplemental data.
This approach was used to provide data browsers for
the Nichols et al. (22) phenotypic profile data and the
analysis of the stress-induced mutagenesis network by
Al-Mamun et al. (46). As with browser tracks, we can
provide URLs to the public view of the data to be
included in publications. These capabilities allow a
greater subset of the research community to use published
data in ways that will increase the citation of the articles
including these links.
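
Synonym-aware searching of a user-supplied table, as described above, can be sketched by mapping both the query and each row identifier to a primary gene name before comparing them. The synonym map, the table rows and the function names below are illustrative assumptions rather than the framework's actual interface.

```python
# Illustrative synonym map (alternate identifier -> primary gene name).
SYNONYMS = {"b0344": "lacZ", "b2699": "recA"}

# Hypothetical user table keyed by whichever identifier the authors used.
USER_ROWS = [
    {"id": "b0344", "score": -1.2},
    {"id": "recA", "score": 3.4},
]


def canonical(identifier):
    """Map an identifier to its primary gene name if a synonym is known."""
    return SYNONYMS.get(identifier, identifier)


def search_rows(rows, query):
    """Return rows whose identifier refers to the same gene as the query."""
    target = canonical(query)
    return [row for row in rows if canonical(row["id"]) == target]


hits = search_rows(USER_ROWS, "lacZ")
```

With this approach a search for a gene name finds a row deposited under an accession, and the reverse, without the authors having to maintain the mapping themselves.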

CONCLUSION 

PortEco has been designed to leverage and integrate with 
the wealth of bioinformatics data resources that include 
information related to E. coli. Leverage and integration 
are also key to how PortEco combines and extends avail- 
able open-source software. Our two wiki projects leverage 
the broader expertise of the research community and
illustrate how MediaWiki can be used to quickly build
community resources for different kinds of information. In
this way PortEco provides important content for use by
researchers using E. coli as a model system, and illustrates
a virtual model organism database approach to building a
data resource.